Falco: a quick and flexible single-cell RNA-seq processing framework on the cloud

Authors

  • Andrian Yang
  • Michael Troup
  • Peijie Lin
  • Joshua W. K. Ho

Abstract

Summary: Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. Nonetheless, current RNA-seq analysis tools are not specifically designed to process scRNA-seq data efficiently, owing to their limited scalability. Here we introduce Falco, a cloud-based framework that parallelizes existing RNA-seq processing pipelines using the big data technologies Apache Hadoop and Apache Spark, enabling massively parallel analysis of large-scale transcriptomic data. Using two public scRNA-seq datasets and two popular RNA-seq alignment/feature quantification pipelines, we show that the same processing pipeline runs 2.6 to 145.4 times faster with Falco than on a highly optimized standalone computer. Falco also lets users take advantage of low-cost Amazon Web Services spot instances, reducing the cost of analysis by ∼65%.

Availability and Implementation: Falco is available under the GNU General Public License at https://github.com/VCCRI/Falco/.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
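
The speedup described above comes from running an existing single-sample pipeline on many cells at once and then combining the outputs. The Python below is a minimal, hypothetical sketch of that split-process-merge pattern; it is not Falco's code, which distributes work across Hadoop/Spark clusters. A local thread pool stands in for the cluster. The names `quantify_cell` and `process_all_cells`, and the toy input format (each cell given as a list of genes its reads were already assigned to), are illustrative assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def quantify_cell(cell_id, assigned_genes):
    """Hypothetical stand-in for one run of an existing per-sample pipeline
    (alignment + feature quantification). Here a cell is just a list of gene
    labels, one per read already assigned to a gene; returns per-gene counts."""
    return cell_id, Counter(assigned_genes)


def process_all_cells(cells, max_workers=4):
    """Map step: run the per-cell step over all cells in parallel.
    Reduce step: merge per-cell counts into a gene-by-cell table (dict of dicts)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda item: quantify_cell(*item), cells.items()))
    genes = sorted({gene for _, counts in results for gene in counts})
    # One row per gene, one column per cell; genes absent from a cell count as 0.
    return {g: {cid: counts.get(g, 0) for cid, counts in results} for g in genes}


if __name__ == "__main__":
    toy_cells = {
        "cell_1": ["GAPDH", "ACTB", "GAPDH"],
        "cell_2": ["ACTB", "SOX2"],
    }
    for gene, row in process_all_cells(toy_cells).items():
        print(gene, row)
```

Because each cell is processed independently, the map step scales out with the number of workers; only the final merge needs to see all results together.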


Similar resources

I-13: Transcriptome Dynamics of Human and Mouse Preimplantation Embryos Revealed by Single Cell RNA-Sequencing

Background: Mammalian preimplantation development is a complex process involving dramatic changes in the transcriptional architecture. However, the crucial transcriptional network and key hub genes that regulate the progression of preimplantation embryos remain unclear. Materials and Methods: Through single-cell RNA-sequencing (RNA-seq) of both human and mouse preimplantation embryos, ...

A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data

Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, and provided data with very high resolution of the expression of different genes for each cell at a single time. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...

An Efficient Resource Allocation for Processing Healthcare Data in the Cloud Computing Environment

Nowadays, processing large-media healthcare data in the cloud has become an effective way of satisfying the medical users' QoS (quality of service) demands. Providing healthcare for the community is a complex activity that relies heavily on information processing. Such processing can be very costly for organizations. However, processing healthcare data in cloud has become an effective s...

Journal:
  • Bioinformatics

Volume 33, Issue 5

Publication date: 2017